.. code:: ipython3

    import pandas as pd
    from seeq import spy

    # Set the compatibility option so that you maximize the chance that SPy
    # will remain compatible with your notebook/script
    spy.options.compatibility = 192

Parameterized Jobs
==================

The simple scheduling methods described in :doc:`spy.jobs <../spy.jobs>`
will often be adequate for your purposes. But in some scenarios, you may
wish to run a suite of jobs across an asset group or some other set of
items. For this you will use the ``spy.jobs.push()`` command.

**This feature is only available for scheduling notebooks in Seeq Data
Lab. You cannot use SPy to schedule content in Anaconda, AWS SageMaker,
or any other Python environment.**

Assemble a DataFrame with the Parameters
----------------------------------------

Let’s take the most common example, which is to schedule a series of
jobs across a group of assets. Search for the assets:

.. code:: ipython3

    schedule_df = spy.search({
        'Path': 'Example >> Cooling Tower 1',
        'Type': 'Asset'
    })
    schedule_df

Now add a ``Schedule`` column, which dictates how often the script will
run. For intervals more frequent than 1 hour, it is highly recommended
that you use intervals that divide an hour evenly, like ‘15 minutes’,
‘20 minutes’ or ‘30 minutes’.

.. code:: ipython3

    schedule_df['Schedule'] = 'every 6 hours'
    schedule_df

You can also use Quartz Cron expressions in place of the natural
language phrasing above; an online Quartz Cron expression generator can
help you build them. As an example, the equivalent Quartz Cron
expression for “every 6 hours” is ``0 0 0/6 ? * * *``.

Sort your Schedule DataFrame
----------------------------

It’s important to sort the DataFrame so that the ordering of the items
is not dependent on how the constituent data happened to be returned by
Seeq or any other data source.

.. code:: ipython3

    # If you have an ID column, it's easiest to sort by that. Otherwise
    # pick something that will result in consistent ordering.
    schedule_df.sort_values('ID', inplace=True, ignore_index=True)

Push the jobs to Seeq
---------------------

The final step is to push the schedule DataFrame to Seeq so that it can
schedule the jobs.

It’s often desirable to “spread out” the execution of the jobs so that
they don’t all execute simultaneously. In this example, we’re executing
the jobs every 6 hours and we’ve asked ``spy.jobs.push()`` to spread
them out evenly over those 6 hours. (In general, the ``spread``
parameter is the same as the frequency of your schedule, since you want
all the jobs to execute within the time interval allocated.)

Execute the following cell (only) to schedule the set of jobs.

.. code:: ipython3

    parameters = spy.jobs.push(schedule_df, spread='6 hours', interactive_index=1)

If you are a Seeq administrator, you can view these jobs by going to
the *Administration* page and clicking on the *Jobs* tab. You will need
to clear the *Groups* filter to see the Notebook jobs.

In the output of the cell above, you’ll notice that the current context
is **INTERACTIVE**, which is the term we use for the scenario where you
are executing cells in the notebook yourself via the Seeq Data Lab user
interface. When you open an HTML file in the ``_Job Results`` folder,
you’ll see that the same cell shows the current context as **JOB**.

In the JOB context, ``parameters`` will be the row of the DataFrame
that pertains to that job instance. In the INTERACTIVE context,
``parameters`` will be the row that corresponds to
``interactive_index``.
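To see the difference concretely, you can inspect ``parameters`` right
after pushing. This is a minimal sketch that only uses the row returned
above; the column values come from the earlier ``spy.search()`` call
and will differ in your environment.

.. code:: ipython3

    # In the INTERACTIVE context, `parameters` is the schedule_df row
    # selected by interactive_index; in the JOB context it is the row
    # for the executing job. Either way it behaves like a pandas Series.
    print(f'Asset:    {parameters["Name"]} ({parameters["ID"]})')
    print(f'Schedule: {parameters["Schedule"]}')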
**We unschedule the jobs here so that your Seeq Data Lab isn’t loaded
down with executing this tutorial.**

.. code:: ipython3

    spy.jobs.unschedule()

Do something cool
-----------------

Now, based on the parameters in ``parameters``, you can do something
interesting. In this example we’ll push a condition to a new (small)
asset tree.

.. code:: ipython3

    parameters

Let’s pretend that we have a spiffy algorithm that can determine the
health of our asset by looking at a couple of signals.

.. code:: ipython3

    health_data_df = spy.pull(spy.search({
        'Asset': parameters['ID'],
        'Name': 'Temperature'
    }), header='Name')

    health_indicator = health_data_df.mean()['Temperature']
    health_status = 'HEALTHY' if health_indicator > 80 else 'UNHEALTHY'

.. code:: ipython3

    metadata_df = pd.DataFrame([{
        'Path': 'Parameterized Jobs Tutorial',
        'Asset': f'{parameters["Name"]}',
        'Name': 'Job Executions',
        'Type': 'Condition',
        'Maximum Duration': '1h'
    }])
    metadata_df

.. code:: ipython3

    import datetime

    start = datetime.datetime.now().isoformat()
    end = (datetime.datetime.now() + datetime.timedelta(minutes=5)).isoformat()

    capsule_data = pd.DataFrame([{
        'Capsule Start': pd.to_datetime(start),
        'Capsule End': pd.to_datetime(end),
        'Health': health_status
    }])
    capsule_data

.. code:: ipython3

    spy.push(capsule_data, metadata=metadata_df)

Scheduling from a separate notebook
-----------------------------------

The ``spy.jobs.push()`` function accepts a ``datalab_notebook_url``
parameter so that a job can be pushed to another notebook to which you
have access. A common use case is to let a user of an Add-on Mode
notebook configure a scheduled notebook through form input. In such a
scenario, the parameters specified by completing the form need to be
passed to the scheduled notebook.

.. code:: ipython3

    path_to_here = '/notebooks/SPy%20Documentation/Advanced%20Scheduling/Parameterized%20Jobs.ipynb'
    this_notebook_url = f'{spy.utils.get_data_lab_project_url()}{path_to_here}'
    spy.jobs.push(schedule_df, spread='6 hours', datalab_notebook_url=this_notebook_url)

No additional work is needed to make the parameters available in the
target notebook. The ``schedule_df`` used in the call to
``spy.jobs.push()`` is automatically pickled to a .pkl file in the
``_Job DataFrames`` folder of the notebook being scheduled. To retrieve
the parameters for a specific job in the jobs DataFrame from the
scheduled notebook, just call ``spy.jobs.pull()``:

.. code:: ipython3

    parameters = spy.jobs.pull(interactive_index=1)
    parameters

The **JOB** and **INTERACTIVE** contexts still apply as described
earlier in this tutorial. Use ``interactive_index`` to control which
row is returned by ``spy.jobs.pull()`` in the interactive context.

The ``push`` and ``pull`` methods can both be used with an additional
``label`` argument, which is useful for enabling reuse of a single
notebook with different parameters. For example, if you want one
schedule per user for a given notebook, the user’s ID could be used as
a label. This ensures that two distinct users can schedule the same
notebook, possibly with distinct parameters created from a separate
notebook or from another application, without unscheduling each
other’s jobs.
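A per-user schedule might look like the following sketch. It assumes
you are logged in (so that ``spy.user`` is populated with the current
user) and reuses ``schedule_df`` from earlier; any string that is
unique per user would work equally well as the label.

.. code:: ipython3

    # Each user's jobs carry their own label, so scheduling under one
    # label leaves jobs scheduled under other labels untouched.
    # spy.user.id is assumed here to hold the logged-in user's ID.
    spy.jobs.push(schedule_df, spread='6 hours', label=spy.user.id)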
Another use for a label is enabling the scheduling of a single notebook
from different Workbench Analyses using an Add-on Tool. In this case, a
convenient label would be an encoding of the Workbook and Worksheet IDs
of the origin worksheet, e.g.,
``workbookId=77953A64-0675-47AE-826F-DEE1FD7AB4C5&worksheetId=5C83DF79-D725-4756-BBE6-4D2D1525D4FF``.
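A minimal sketch of constructing such a label, assuming the workbook
and worksheet IDs have already been obtained from the originating
Analysis (the hard-coded IDs below are just the example values from
above):

.. code:: ipython3

    # Hypothetical IDs of the origin worksheet, e.g. passed to the
    # Add-on Tool by the Workbench Analysis that launched it
    workbook_id = '77953A64-0675-47AE-826F-DEE1FD7AB4C5'
    worksheet_id = '5C83DF79-D725-4756-BBE6-4D2D1525D4FF'

    label = f'workbookId={workbook_id}&worksheetId={worksheet_id}'
    spy.jobs.push(schedule_df, spread='6 hours', label=label)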